Paragraph-based Prosodic Cues for Speech Synthesis Applications

Authors

  • Mireia Farrús
  • Catherine Lai
  • Johanna D. Moore
Abstract

Speech synthesis has improved in both expressiveness and voice quality in recent years. However, obtaining full expressiveness when dealing with large multi-sentential synthesized discourse is still a challenge, since speech synthesizers do not take into account the prosodic differences that have been observed in discourse units such as paragraphs. The current study validates and extends previous work by analyzing the prosody of paragraph units in a large and diverse corpus of TED Talks using automatically extracted F0, intensity and timing features. In addition, a series of classification experiments was performed in order to identify which features are consistently used to distinguish paragraph breaks. The results show significant differences in prosody related to paragraph position. Moreover, the classification experiments show that boundary features such as pause duration and differences in F0 and intensity levels are the most consistent cues in marking paragraph boundaries. This suggests that these features should be taken into account when generating spoken discourse in order to improve naturalness and expressiveness.

Related resources

Mandarin spontaneous narrative planning - prosodic evidence from national taiwan university lecture corpus

This paper discusses discourse planning of pre-organized spontaneous narratives (SpnNS) in comparison with read speech (RS). F0 and tempo modulations are compared by speech paragraph size and discourse boundaries. The speaking rate of SpnNS from university classroom lectures is 2 to 3 times that of RS by professionals; paragraph phrasing of SpnNS is 6 times that of RS. Patterns of paragraph a...

Prosodic Cues in Multimodal Speech Perception

Potential visual prosodic cues for prominence and phrasing comprising eyebrow movements were manipulated using a system for audio-visual text-to-speech synthesis which has been implemented based on the KTH rule-based synthesis. Two functions of prosody (prominence and phrasing) were tested in two separate experiments. A test sentence, ambiguous in terms of an internal phrase boundary, was used ...

Prosodic mapping of text font based on the dimensional theory of emotions: a case study on style and size

Current text-to-speech systems do not support the effective provision of the semantics and the cognitive aspects of the documents’ typographic cues (e.g., font type, style, and size). A novel approach is introduced for the acoustic rendition of text font based on the emotional analogy between the visual (text font cues) and the acoustic (speech prosody) modalities. The methodology is based on: ...

Discourse prosody context - global F0 and tempo modulations

The present study is a corpus analysis of discourse prosodic information using two different types of fluent continuous Mandarin speech. Global F0 heights and duration patterns of within- and between-paragraph phrases were compared by discourse positions. Results showed that overall phrase-level F0 height was paragraph-initial>-medial>-final while the tempo pattern was paragraph-initial<-medial<-...

Prosodic and Spectral iVectors for Expressive Speech Synthesis

This work presents a study on the suitability of prosodic and acoustic features, with a special focus on i-vectors, in expressive speech analysis and synthesis. For each utterance of two different databases, a laboratory recorded emotional acted speech, and an audiobook, several prosodic and acoustic features are extracted. Among them, i-vectors are built not only on the MFCC base, but also on ...

Publication date: 2016